[RDST-111] keep tensorboard directory clean during s3 sync #3

jbencook · 2018-01-22T20:22:41Z

This PR is a workaround to make sure TensorBoard picks up logs that get synced in a local directory. The problem seems to be that aws s3 sync creates temporary files while it's downloading them. These are lexicographically greater than the files they are updating (because they append .<some hash> to the end). But TensorBoard stops looking at a file when a new one shows up (explanation here).

rversaw · 2018-01-23T22:28:33Z

src/sagemaker/tensorflow/estimator.py

@@ -38,6 +38,7 @@ def __init__(self, estimator, logdir=None):
        threading.Thread.__init__(self)
        self.event = threading.Event()
        self.estimator = estimator
+        self.aws_sync_dir = tempfile.mkdtemp()


My gut says this should be self._aws_sync_dir, but it's not clear to me why some fields are considered public while others aren't, so it probably doesn't matter.

rversaw · 2018-01-23T22:32:37Z

src/sagemaker/tensorflow/estimator.py

+        """Sync to_directory with from_directory by copying each file in
+        to_directory with new contents. Why do this? Because TensorBoard picks
+        up temp files from `aws s3 sync` and then stops reading the correct
+        tfevent files.


Is there a relevant TensorBoard issue we could include here? I'm just curious if that could be an easier reference for the future in understanding this and knowing if/when it's fixed, versus using git blame to track down this discussion.

This is the closest one I came across, though it's slightly different: tensorflow/tensorboard#70

Yeah I still think it's probably related to tensorflow/tensorboard#349. I'll put that in the docstring.

rversaw · 2018-01-23T23:03:46Z

src/sagemaker/tensorflow/estimator.py

+            from_directory (str): The directory with updated files.
+            to_directory (str): The directory to be synced.
+        """
+        for root, dirs, files in os.walk(from_directory):


I think you'll want to check to see if to_directory exists before this loop, then create it if it doesn't. From what I can tell, running aws s3 sync src_dir dst_dir will create dst_dir for you, but it looks like we're only attempting to create subdirectories of to_directory rather than the directory itself.

jalabort

This looks like a fairly elegant and only Python solution that solves a problem that must be affecting quite a lot of people. We should consider opening a PR upstream.

copy tf files to clean directory for tensorboard

3acbe33

jbencook mentioned this pull request Jan 22, 2018

Tensorboard not displaying scalars aws/sagemaker-python-sdk#26

Closed

rversaw reviewed Jan 23, 2018

View reviewed changes

jalabort assigned jbencook Jan 24, 2018

jalabort approved these changes Jan 24, 2018

View reviewed changes

address comments

3dc4fdc

jbencook merged commit 6acf497 into master Jan 24, 2018

pindinagesh mentioned this pull request Nov 24, 2021

Fails to show new data when event files are replaced (e.g. w/ rsync) tensorflow/tensorboard#349

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[RDST-111] keep tensorboard directory clean during s3 sync #3

[RDST-111] keep tensorboard directory clean during s3 sync #3

jbencook commented Jan 22, 2018

rversaw Jan 23, 2018

rversaw Jan 23, 2018

jbencook Jan 24, 2018

rversaw Jan 23, 2018

jalabort left a comment •

edited

Loading

[RDST-111] keep tensorboard directory clean during s3 sync #3

[RDST-111] keep tensorboard directory clean during s3 sync #3

Conversation

jbencook commented Jan 22, 2018

rversaw Jan 23, 2018

Choose a reason for hiding this comment

rversaw Jan 23, 2018

Choose a reason for hiding this comment

jbencook Jan 24, 2018

Choose a reason for hiding this comment

rversaw Jan 23, 2018

Choose a reason for hiding this comment

jalabort left a comment • edited Loading

Choose a reason for hiding this comment

jalabort left a comment •

edited

Loading